Refine concurrency logic #5063
We need to make sure we're not using so many cores that the tests are starved of memory. Signed-off-by: Adam Farley <[email protected]>
Instead of what it does right now, which is to compare bytes with kilobytes, and incorrectly assume that the kilobytes number is always smaller. Signed-off-by: Adam Farley <[email protected]>
Linked to #5012. May not close it; further review required. Should improve matters, though.
Testing here: https://ci.adoptium.net/job/Grinder/8831/console
Yep, that worked. Now sets concurrency to a more sensible level. Ready to merge.
/sys/fs/cgroup/memory/memory.limit_in_bytes is inherently cgroup v1 (legacy) specific. This won't work on cgroup v2 systems. FYI.
lgtm
@adamfarley Because I'm paranoid about these things can you fire this off on a subset of diverse machines similar to the table at the top of #4933 to check that this won't regress anything please?
Suggest, as a minimum: one of the PLCT RISC-V machines (4 core, 16GB), test-sxa-armv7l-ubuntu2004-odroid-2 (8 core, 2GB), something running on an Ubuntu 20.04 host, and something on Ubuntu 22.04.
Gotcha. Will run with the updated fix.
Ok, given that (a) we want to tolerate both memory.max AND memory.limit_in_bytes, and (b) memory.max can be present, readable, and still empty, here's a proposal for the new script:
I added formatting for an easier review. What do you think?
P.S. If I ever complain I have too much spare time, remind me of this makefile. Turning it into a bash script would remove the need to put all of this script into a single line.
In cgroup v1 we used memory.limit_in_bytes to store the maximum memory allocated to the container. In v2, we use memory.max. This change allows us to check for both, includes a meminfo check for non-containers, and adds a few debug comments so we can be sure this new code is working. My plan is to run tests on a diverse set of machines, and to remove the debug statements prior to merging. Signed-off-by: Adam Farley <[email protected]>
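The lookup order described in this commit message can be sketched roughly as follows. This is a minimal illustration rather than the script that was merged: the function name `read_mem_limit_bytes` and its two path parameters are hypothetical (the parameters exist only so the function can be tried against a scratch directory), and it assumes the v2 file sits directly under the cgroup root while the v1 file sits under `memory/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the memory limit in bytes, or nothing if unknown.
# $1 = cgroup root (assumed default /sys/fs/cgroup), $2 = meminfo file.
read_mem_limit_bytes() {
  local root="${1:-/sys/fs/cgroup}"
  local meminfo="${2:-/proc/meminfo}"
  local f value kb

  for f in "$root/memory.max" "$root/memory/memory.limit_in_bytes"; do
    # cat fails quietly on missing or unreadable files, leaving value empty.
    value="$(cat "$f" 2>/dev/null)"
    # Accept only a plain number: this skips empty files and the
    # literal "max" that cgroup v2 uses when no limit is set.
    case "$value" in
      ''|*[!0-9]*) continue ;;
    esac
    echo "$value"
    return 0
  done

  # No usable cgroup limit: fall back to MemTotal, which is reported in kB.
  kb="$(awk '/^MemTotal:/ { print $2 }' "$meminfo" 2>/dev/null)"
  if [ -n "$kb" ]; then
    echo $(( kb * 1024 ))
  fi
}
```

Calling `read_mem_limit_bytes` with no arguments reads the real system paths; every result is in bytes, so later comparisons never mix units.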
Test run: https://ci.adoptium.net/job/Grinder/8832/ Will check across other machines once I know the new code works.
You should test on the FS type on the cgroup root. In pseudo code:
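A minimal sketch of such a filesystem-type check, assuming GNU `stat` and the usual mount point `/sys/fs/cgroup`; this is an illustration, not the reviewer's original pseudocode. The function name `cgroup_version` is hypothetical. On a unified (v2) hierarchy the root is typically reported as `cgroup2fs`, while on v1 the root is typically a `tmpfs` with the individual controllers mounted beneath it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "v2", "v1", or "unknown" for the cgroup mount at $1.
cgroup_version() {
  local root="${1:-/sys/fs/cgroup}"
  local fstype
  # GNU stat: -f queries the filesystem, %T prints its type as a name.
  fstype="$(stat -fc %T "$root" 2>/dev/null)"
  case "$fstype" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}

cgroup_version
```

Anything other than the two expected types, including a missing directory or a `stat` without `-f -c` support, falls through to `unknown`, so the caller can fall back to another source of the memory size.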
Ok, had a ponder and came up with this:
This way we:
Added some debug code (so we know we're following the right code path) and ran to test it here: https://ci.adoptium.net/job/Grinder/8836/
To ensure we're looking in the correct file for the maximum memory size. Also to handle permissions issues, empty files, etc. Signed-off-by: Adam Farley <[email protected]>
Signed-off-by: Adam Farley <[email protected]>
Means you assume
Do you think that's going to trap additional situations compared to just checking for the presence of memory.max? Given where this code is (on one line for now), my preference would probably be to check just one or the other unless we need both checks, since this is already quite complex for one line.
We should avoid mixing
Is that check even necessary?
Taking into consideration Stewart and Severin's points, I'm testing this:
This removes the cgroup version check, but also removes the need for it (as we now have error-handling). One fewer man-page check (for stat) and one fewer "if" layer, to make debugging easier.
Makes it easier to maintain, and more resilient to failure. I've also added a commented, formatted copy of the script to help people read it in the future. Signed-off-by: Adam Farley <[email protected]>
Signed-off-by: Adam Farley <[email protected]>
Also, I added the formatted copy of the script into a big comment, to make maintenance easier in the future.
Signed-off-by: Adam Farley <[email protected]>
Ok, typo fixed and code works. All debug comments removed, and tested here: https://ci.adoptium.net/job/Grinder/8850/console Requesting merge now.
This change ensures that we identify the correct memory size when running tests in containers, by setting CGMEM to the memory size in bytes in all circumstances.
We could remove the "-lt" logic altogether, but we shouldn't because sometimes the cgroup limits are set far beyond the machine's maximum memory, for reasons we do not yet understand.
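The "-lt" comparison can be sketched as below, as an illustration under stated assumptions rather than the merged makefile line. The function name `effective_mem_bytes` and its argument order are hypothetical; it assumes the cgroup limit is already in bytes and that the machine total comes from `MemTotal` in `/proc/meminfo`, which is reported in kB. Converting before comparing avoids the bytes-versus-kilobytes mismatch described earlier in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the smaller of a cgroup limit and the machine total, in bytes.
# $1 = cgroup limit in bytes, $2 = MemTotal in kB (as found in /proc/meminfo).
effective_mem_bytes() {
  local cgmem_bytes="$1"
  local memtotal_kb="$2"
  # Convert kB to bytes first so both sides of -lt use the same unit.
  local memtotal_bytes=$(( memtotal_kb * 1024 ))
  if [ "$cgmem_bytes" -lt "$memtotal_bytes" ]; then
    echo "$cgmem_bytes"
  else
    echo "$memtotal_bytes"
  fi
}

# A 1 GiB container limit on a 16 GiB machine: the limit wins.
effective_mem_bytes 1073741824 16777216
# An illustrative limit far beyond a 2 GiB machine: the machine total wins.
effective_mem_bytes 9223372036854771712 2097152
```

Keeping the comparison means an oversized cgroup limit never inflates the memory figure used to choose the concurrency level.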